Learn CUDA programming for NVIDIA Hopper GPUs. You will learn to build efficient WGMMA pipelines and leverage CUTLASS optimizations to perform the massive matrix multiplications that power modern AI. Beyond single-chip performance, the curriculum covers multi-GPU scaling and the NCCL primitives needed to train trillion-parameter models. To get the most out of these lessons, you should have a foundational grasp of C++ syntax and linear algebra, particularly how matrices are tiled and multiplied.
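As a quick refresher on the tiling prerequisite, here is a minimal CPU sketch in C++ (not taken from the course) of a tiled matrix multiplication. `TILE = 4` is a hypothetical size chosen for readability. Real Hopper kernels use far larger tiles staged through shared memory and fed to tensor cores, none of which this sketch models. It only shows how the loops are blocked so each tile of `A` and `B` is reused before moving on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile size, kept small so the sketch is easy to trace.
constexpr int TILE = 4;

// C (M x N) += A (M x K) * B (K x N); all matrices are row-major.
// The outer three loops walk over TILE x TILE blocks; the inner three
// multiply one tile of A by one tile of B and accumulate into a tile of C.
// std::min handles edge tiles when M, N or K is not a multiple of TILE.
void tiled_matmul(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, int M, int N, int K) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    assert(C.size() == static_cast<std::size_t>(M) * N);
    for (int i0 = 0; i0 < M; i0 += TILE)
        for (int j0 = 0; j0 < N; j0 += TILE)
            for (int k0 = 0; k0 < K; k0 += TILE)
                for (int i = i0; i < std::min(i0 + TILE, M); ++i)
                    for (int k = k0; k < std::min(k0 + TILE, K); ++k) {
                        const float a = A[i * K + k];  // reused across the j loop
                        for (int j = j0; j < std::min(j0 + TILE, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

// Untiled triple loop, useful as a reference to check the tiled version.
void naive_matmul(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, int M, int N, int K) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                C[i * N + j] += A[i * K + k] * B[k * N + j];
}
```

For example, multiplying the 2x2 matrices `{1, 2, 3, 4}` and `{5, 6, 7, 8}` with `tiled_matmul` into a zeroed `C` yields `{19, 22, 43, 50}`, the same as the naive loop. On a GPU, the same blocking idea decides which thread block owns which output tile.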
- Course website:
- Course repo:
- X:
- GitHub Sponsors:
✏️ Developed by Prateek_Shukla
❤️ Support for this channel comes from our friends at Scrimba – the coding platform that's reinvented interactive learning:
0:00:00 Course Introduction
0:07:27 Table of Contents & Course Overview
0:23:30 LESSON 1 — H100 Hopper GPU Architecture
0:25:47 H100 Specifications: HBM3, Bandwidth & Power
0:26:22 Tensor Cores Overview
0:27:18 Tensor Memory Accelerator (TMA)
0:34:44 Transformer Engine
0:34:58 L2 Cache Architecture
0:35:21 GPCs, TPCs & SM Layout
0:37:00 Thread Block Clusters
0:46:22 Distributed Shared Memory
0:52:44 SM Sub-Partitions (SMSPs)
0:54:01 Warp Schedulers & Dispatch Units
1:02:37 Shared Memory & Data Movement
1:12:20 Occupancy
1:32:49 LESSON 2 — Clusters, Data Types, Inline PTX & Pointers
1:32:57 Thread Block Clusters Programming
1:42:11 Configuring Cluster Dimensions
1:48:08 Inline PTX Assembly
1:59:31 State Spaces
2:06:01 Data Types in PTX
2:07:16 Generic Pointers
2:09:59
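To make the Occupancy topic concrete, here is a simplified C++ sketch (not the course's code) of theoretical occupancy on one SM. It assumes H100 (compute capability 9.0) per-SM limits: 2048 resident threads, 32 resident blocks, 65,536 32-bit registers, and up to 228 KB of shared memory. It rounds registers per warp up to 256-register units but ignores the per-block shared-memory reservation and allocation granularity, so the results are approximate. The CUDA runtime's `cudaOccupancyMaxActiveBlocksPerMultiprocessor` is the authoritative source. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Assumed per-SM limits for H100 (compute capability 9.0).
struct SmLimits {
    int max_threads = 2048;         // resident threads per SM (64 warps)
    int max_blocks = 32;            // resident thread blocks per SM
    int registers = 65536;          // 32-bit registers per SM
    int shared_bytes = 228 * 1024;  // max shared memory per SM
    int reg_alloc_unit = 256;       // registers allocated per warp in units of 256
};

// Hypothetical description of a kernel launch.
struct KernelConfig {
    int threads_per_block;
    int regs_per_thread;
    int smem_per_block;  // bytes of shared memory per block
};

// Resident blocks per SM = minimum over the thread, block,
// register and shared-memory limits.
int blocks_per_sm(const KernelConfig& k, const SmLimits& sm = SmLimits{}) {
    assert(k.threads_per_block > 0 && k.threads_per_block % 32 == 0);
    const int warps_per_block = k.threads_per_block / 32;
    const int by_threads = sm.max_threads / k.threads_per_block;
    // Round each warp's register usage up to the allocation unit.
    const int regs_per_warp =
        (k.regs_per_thread * 32 + sm.reg_alloc_unit - 1) / sm.reg_alloc_unit * sm.reg_alloc_unit;
    const int by_regs = k.regs_per_thread == 0
        ? sm.max_blocks
        : sm.registers / (regs_per_warp * warps_per_block);
    const int by_smem = k.smem_per_block == 0
        ? sm.max_blocks
        : sm.shared_bytes / k.smem_per_block;
    return std::min({by_threads, sm.max_blocks, by_regs, by_smem});
}

// Theoretical occupancy = active warps / maximum warps per SM.
double occupancy(const KernelConfig& k, const SmLimits& sm = SmLimits{}) {
    const int active_warps = blocks_per_sm(k, sm) * (k.threads_per_block / 32);
    return static_cast<double>(active_warps) / (sm.max_threads / 32);
}
```

Under these assumptions, 256-thread blocks using 32 registers per thread reach full occupancy (8 blocks, 64 warps). Raising usage to 64 registers per thread halves it to 0.5, and giving 128-thread blocks 100 KB of shared memory each limits the SM to 2 blocks (occupancy 0.125). Whichever resource runs out first sets the limit.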